29 research outputs found

    Statistical mechanics models for biological systems: cooperativity in biochemistry and affinity maturation of antibodies

    Statistical mechanics provides useful tools and concepts for dealing with the collective behavior of many strongly interacting agents. Setting aside the detailed, specific description of the interactions to focus on their key features makes it possible to ask different questions about the global, systemic properties of biological systems. The information-processing and statistical-inference approach has become more urgent in recent decades because of the large amount of data produced by new experimental techniques. Concepts such as entropy, phase transition and criticality have entered the essential terminology used to describe biological systems at very different levels of complexity: from collective animal behaviour, through physiological apparatuses such as the nervous system and the immune system, to the biochemical processes in cells. The studies presented in this thesis belong to this interdisciplinary context. The thesis is divided into three main parts. The first is devoted to the more formal aspects of statistical mechanics models of spin systems. In the first chapter we briefly review three milestone models of spin systems: the Curie-Weiss, the Sherrington-Kirkpatrick and the Hopfield models. These models are the paradigmatic examples of mean-field statistical mechanics and form the ground for the studies in biochemical kinetics and immunology presented in the following parts. In the second chapter we report a detailed study of a generalization of the Hopfield model with diluted and correlated patterns. We investigate the topology of the emergent interaction network and find an exact expression for the coupling distribution, which allows us to distinguish different regimes as the dilution parameter varies. Moreover, we study the thermodynamic properties of the model, obtaining explicitly the replica-symmetric free energy together with its self-consistency equations. From the small-overlap expansion of these self-consistency equations we obtain the critical surface separating the ergodic phase from the spin-glass phase. The second part of the thesis focuses on the investigation of cooperative behavior in biochemical kinetics through mean-field statistical mechanics. Cooperativity is one of the most important properties of molecular interactions in biological systems, as it is often invoked to account for collective features in binding phenomena. It is a fundamental tool that nature has developed to modulate the chemical response of biological systems to varying stimuli. Statistical mechanics offers a valuable approach because, from its first principles, it aims to explain collective phenomena, allowing a unified and broader theory of complex chemical kinetics. In this way different cooperative behaviors, described by the corresponding binding curves, can be analysed in a unified framework. We compare the theoretical curves predicted by the model with experimental data from the literature, finding overall good agreement and extracting the values of the effective interactions between the binding sites, which can be put in direct correspondence with the standard coefficient that measures cooperativity (the Hill number). Moreover, an extension of the model accounts for heterogeneity that can affect both the couplings between the multiple active sites (allosteric regulation) and the chemical potentials in the binding of the ligands. The last part is dedicated to a statistical inference analysis of deep sequencing data from an antibody repertoire, with the purpose of studying the process of antibody affinity maturation. A partial antibody repertoire from an HIV-1 infected donor presenting broadly neutralizing serum is used to infer a probability distribution in the space of sequences. The idea is to use the model to study the structure of the affinity for an antigen as a function of the antibody sequence. We test this strategy using neutralization power measurements and the deposited crystallographic structure of a deeply matured antibody. The work is still in progress, but preliminary results are encouraging and are presented here.

    Collective behaviours: from biochemical kinetics to electronic circuits

    In this work we aim to highlight a close analogy between cooperative behaviors in chemical kinetics and cybernetics; this is achieved by using a common language for their description, namely mean-field statistical mechanics. First, we perform a one-to-one mapping between paradigmatic behaviors in chemical kinetics (i.e., non-cooperative, cooperative, ultra-sensitive, anti-cooperative) and in mean-field statistical mechanics (i.e., paramagnetic, high- and low-temperature ferromagnetic, anti-ferromagnetic). Interestingly, the statistical mechanics approach allows a unified, broad theory for all scenarios and, in particular, the Michaelis-Menten, Hill and Adair equations are consistently recovered. This framework is then tested against experimental biological data, with overall excellent agreement. Going one step further, we read the whole mapping from a cybernetic perspective, highlighting deep structural analogies between the above-mentioned kinetics and fundamental building blocks in electronics (i.e., operational amplifiers, flashes, flip-flops), so as to build a clear bridge linking biochemical kinetics and cybernetics. Comment: 15 pages, 6 figures; to appear in Scientific Reports: Nature Publishing Group
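
The mapping between binding curves and mean-field magnets described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the identification of the field with `0.5 * ln(alpha)` (where `alpha` is a dimensionless ligand activity), the Curie-Weiss self-consistency `m = tanh(coupling * m + h)`, the function names and the damped iteration are all assumptions chosen for illustration.

```python
import math

def binding_fraction(alpha, coupling, n_iter=500):
    """Fraction of occupied sites for ligand activity alpha > 0.

    Assumed mapping (illustrative): field h = 0.5 * ln(alpha), magnetization m
    solves m = tanh(coupling * m + h), and the saturation is (1 + m) / 2.
    """
    h = 0.5 * math.log(alpha)
    m = math.tanh(h)  # exact solution when coupling == 0
    for _ in range(n_iter):
        # Damped fixed-point step; the damping keeps the iteration stable
        m = 0.5 * m + 0.5 * math.tanh(coupling * m + h)
    return 0.5 * (1.0 + m)

def effective_hill_number(coupling, eps=1e-4):
    """Slope of ln(theta / (1 - theta)) versus ln(alpha) at half saturation."""
    up = binding_fraction(math.exp(eps), coupling)
    down = binding_fraction(math.exp(-eps), coupling)
    # At theta = 1/2 the logit slope equals 4 * d(theta)/d(ln alpha)
    return 4.0 * (up - down) / (2.0 * eps)

if __name__ == "__main__":
    for coupling in (0.0, 0.5):
        print(coupling, binding_fraction(3.0, coupling),
              effective_hill_number(coupling))
```

Under these assumptions, zero coupling gives the Michaelis-Menten form `alpha / (1 + alpha)`, and for couplings below 1 the slope at half saturation works out to `1 / (1 - coupling)`, which plays the role of a Hill number. The sketch is only meant for couplings below 1; at stronger coupling the curve becomes discontinuous, which corresponds to the ultra-sensitive regime.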

    Unsupervised inference of protein fitness landscape from deep mutational scan

    Recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans, obtaining accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and generalizes well, both in exploring broader regions of sequence space and in predicting higher-fitness variants. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.

    <Notes> Production Techniques and Employment Creation in Underdeveloped Economies

    The immune system has developed a number of distinct complex mechanisms to shape and control the antibody repertoire. One of these mechanisms, the affinity maturation process, works in an evolutionary-like fashion: after binding to a foreign molecule, the antibody-producing B-cells exhibit a high-frequency mutation rate in the genome region that codes for the antibody active site. Eventually, cells that produce antibodies with higher affinity for their cognate antigen are selected and clonally expanded. Here, we propose a new statistical approach based on maximum entropy modeling in which a scoring function related to the binding affinity of antibodies against a specific antigen is inferred from a sample of sequences of the immune repertoire of an individual. We use our inference strategy to infer a statistical model on a data set obtained by sequencing a fairly large portion of the immune repertoire of an HIV-1 infected patient. The Pearson correlation coefficient between our scoring function and the IC50 neutralization titer measured on 30 different antibodies of known sequence is as high as 0.77 (p-value $10^{-6}$), outperforming other sequence- and structure-based models.

    Analogue neural networks on correlated random graphs

    We consider a generalization of the Hopfield model, where the entries of patterns are Gaussian and diluted. We focus on the high-storage regime and we investigate analytically the topological properties of the emergent network, as well as the thermodynamic properties of the model. We find that, by properly tuning the dilution in the pattern entries, the network can recover different topological regimes characterized by peculiar scalings of the average coordination number with respect to the system size. The structure is also shown to exhibit a large degree of cliquishness, even when very sparse. Moreover, we obtain explicitly the replica symmetric free energy and the self-consistency equations for the overlaps (order parameters of the theory), which turn out to be classical weighted sums of 'sub-overlaps' defined on all possible sub-graphs. Finally, a study of criticality is performed through a small-overlap expansion of the self-consistency equations and through a whole fluctuation theory developed for their rescaled correlations: both approaches show that the net effect of dilution in pattern entries is to rescale the critical noise level at which ergodicity breaks down. Comment: 34 pages, 3 figures
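
How an interaction network emerges from diluted patterns can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's construction: each pattern entry is kept with a fixed probability `1 - dilution` (the paper's scaling of the dilution with system size is not reproduced), the couplings use the standard Hebbian rule, and all function names are hypothetical.

```python
import random

def diluted_patterns(n_spins, n_patterns, dilution, seed=0):
    """Gaussian pattern entries, each set to zero with probability `dilution`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) if rng.random() >= dilution else 0.0
             for _ in range(n_spins)]
            for _ in range(n_patterns)]

def hebbian_couplings(patterns):
    """Hebbian couplings J[i][j] = (1/N) * sum over patterns of xi_i * xi_j."""
    n = len(patterns[0])
    couplings = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        # Only pairs of non-zero entries contribute to the sum
        active = [i for i, value in enumerate(xi) if value != 0.0]
        for i in active:
            for j in active:
                if i != j:
                    couplings[i][j] += xi[i] * xi[j] / n
    return couplings

def mean_degree(couplings):
    """Average number of non-zero couplings per spin (coordination number)."""
    links = sum(1 for row in couplings for value in row if value != 0.0)
    return links / len(couplings)

if __name__ == "__main__":
    for dilution in (0.0, 0.5, 0.9):
        patterns = diluted_patterns(50, 10, dilution)
        print(dilution, mean_degree(hebbian_couplings(patterns)))
```

In this toy version two spins are linked whenever at least one pattern has non-zero entries on both, so raising `dilution` moves the network from fully connected towards sparse.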

    Efficient generative modeling of protein sequences using simple autoregressive models

    Generative models are emerging as promising candidates for novel sequence-data driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between $10^2$ and $10^3$). Furthermore, the simple structure of our models has distinctive mathematical advantages, which translate into an improved applicability in sequence generation and evaluation. Within these models, we can easily estimate both the probability of a given sequence, and, using the model's entropy, the size of the functional sequence space related to a specific protein family. In the example of response regulators, we find a huge number of ca. $10^{68}$ possible sequences, which nevertheless constitute only the astronomically small fraction $10^{-80}$ of all amino-acid sequences of the same length. These findings illustrate the potential and the difficulty in exploring sequence space via generative sequence models. Comment: 12 pages, 4 Figures + Supplementary Material
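
The two quantities mentioned above, the probability of a given sequence and the model entropy, follow directly from the chain rule in any autoregressive model. The Python sketch below uses a toy parameterization (a softmax over a site field plus couplings to all earlier sites, with random parameters); this form, the parameter names and the tiny alphabet are assumptions for illustration, not the trained models of the paper.

```python
import itertools
import math
import random

def make_params(length, q, seed=0, scale=0.5):
    """Random toy parameters: fields h[i][a], couplings J[i][j][a][b] for j < i."""
    rng = random.Random(seed)
    h = [[rng.gauss(0.0, scale) for _ in range(q)] for _ in range(length)]
    J = [[[[rng.gauss(0.0, scale) for _ in range(q)] for _ in range(q)]
          for _ in range(i)] for i in range(length)]
    return h, J

def conditional(i, prefix, h, J):
    """P(a_i | a_1 .. a_{i-1}) as a softmax over the q symbols at site i."""
    logits = [h[i][a] + sum(J[i][j][a][prefix[j]] for j in range(i))
              for a in range(len(h[i]))]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logits]
    norm = sum(weights)
    return [w / norm for w in weights]

def log_prob(seq, h, J):
    """Chain rule: log P(seq) is the sum of the conditional log-probabilities."""
    return sum(math.log(conditional(i, seq[:i], h, J)[a])
               for i, a in enumerate(seq))

def sample(h, J, rng):
    """Draw one sequence site by site; no Markov chain Monte Carlo is needed."""
    seq = []
    for i in range(len(h)):
        probs = conditional(i, seq, h, J)
        seq.append(rng.choices(range(len(probs)), weights=probs)[0])
    return seq

def entropy_exact(h, J):
    """Entropy by full enumeration; feasible only for very short toy sequences."""
    length, q = len(h), len(h[0])
    total = 0.0
    for seq in itertools.product(range(q), repeat=length):
        lp = log_prob(seq, h, J)
        total -= math.exp(lp) * lp
    return total

if __name__ == "__main__":
    h, J = make_params(length=4, q=3)
    print(sample(h, J, random.Random(1)), math.exp(entropy_exact(h, J)))
```

The exponential of the entropy can be read as an effective number of sequences. For realistic sequence lengths enumeration is impossible, and the entropy would instead be estimated as the average of `-log_prob` over sequences drawn with `sample`.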

    AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape

    We present Annealed Mutational approximated Landscape (AMaLa), a new method to infer fitness landscapes from sequencing data of Directed Evolution experiments. Such experiments typically start from a single wild-type sequence, which undergoes Darwinian in vitro evolution via multiple rounds of mutation and selection for a target phenotype. In recent years, Directed Evolution has been emerging as a powerful instrument to probe fitness landscapes under controlled experimental conditions and, thanks to high-throughput screening and sequencing, as a relevant testing ground for developing accurate statistical models and inference algorithms. Existing fitness landscape modeling either uses the enrichment of variant abundances as input, which requires observing the same variants at different rounds, or assumes that the last sequenced round is sampled from an equilibrium distribution. AMaLa aims at effectively leveraging the information encoded in the whole time evolution. To do so, while assuming statistical sampling independence between sequenced rounds, the possible trajectories in sequence space are gauged with a time-dependent statistical weight consisting of two contributions: (i) an energy term accounting for the selection process and (ii) a generalized Jukes–Cantor model for the purely mutational step. This simple scheme accurately describes the Directed Evolution dynamics and infers a fitness landscape that correctly reproduces the measurements of the phenotype under selection (e.g., antibiotic drug resistance), notably outperforming widely used inference strategies. In addition, we assess the reliability of AMaLa by showing how the inferred statistical model can be used to predict relevant structural properties of the wild-type sequence.
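
The two contributions to the statistical weight can be illustrated with a short Python sketch. It is a loose illustration and not the AMaLa implementation: the mutational part is a plain q-state Jukes-Cantor model in which each site is resampled uniformly at rate `mu`, the selection part is an arbitrary user-supplied `energy` function scaled by a hypothetical `beta`, and how the actual method couples these terms to the round index is not specified here.

```python
import math

def jc_site_prob(same, q, mu, t):
    """q-state Jukes-Cantor probability for one site after time t.

    Assumed parameterization: at rate mu the site is redrawn uniformly
    among all q symbols, so the original symbol survives with exp(-mu * t).
    """
    decay = math.exp(-mu * t)
    redrawn = (1.0 - decay) / q
    return decay + redrawn if same else redrawn

def jc_log_prob(seq, wild_type, q, mu, t):
    """Log-probability of reaching seq from wild_type by mutation alone."""
    if len(seq) != len(wild_type):
        raise ValueError("sequences must have the same length")
    if t <= 0:
        raise ValueError("t must be positive")
    return sum(math.log(jc_site_prob(a == b, q, mu, t))
               for a, b in zip(seq, wild_type))

def log_weight(seq, wild_type, energy, q, mu, t, beta):
    """Unnormalized log-weight: selection term plus mutational term."""
    return -beta * energy(seq) + jc_log_prob(seq, wild_type, q, mu, t)

if __name__ == "__main__":
    wild_type = "ACDE"

    def toy_energy(seq):
        # Hypothetical energy: each alanine lowers the energy by one unit
        return -float(seq.count("A"))

    for variant in ("ACDE", "ACDF", "AADE"):
        print(variant, log_weight(variant, wild_type, toy_energy,
                                  q=20, mu=0.1, t=1.0, beta=1.0))
```

With the mutational term alone, every additional difference from the wild type lowers the weight; the energy term can offset this for variants favoured by selection.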